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A HIERARCHICAL MAX-STABLE SPATIAL MODEL FOR 
EXTREME PRECIPITATION^ 

By Brian J. Reich and Benjamin A. Shaby 
North Carolina State University and University of California — Berkeley 

Extreme environmental phenomena such as major precipitation 
events manifestly exhibit spatial dependence. Max-stable processes 
are a class of asymptotically-justified models that are capable of 
representing spatial dependence among extreme values. While these 
models satisfy modeling requirements, they are limited in their utility 
because their corresponding joint likelihoods are unknown for more 
than a trivial number of spatial locations, preventing, in particular, 
Bayesian analyses. In this paper, we propose a new random effects 
model to account for spatial dependence. We show that our specifica- 
tion of the random effect distribution leads to a max-stable process 
that has the popular Gaussian extreme value process (GEVP) as a 
limiting case. The proposed model is used to analyze the yearly max- 
imum precipitation from a regional climate model. 

1. Introduction. Spatial statistical techniques are crucial for accurately 
quantifying the likelihood of extreme events and monitoring changes in their 
frequency and intensity. Extreme events are by definition rare, therefore, 
estimation of local climate characteristics can be improved by borrowing 
strength across nearby locations. While methods for univariate extreme data 
are well developed, modeling spatially-referenced extreme data is an active 
area of research. Max-stable processes [de Haan and Ferreira (2006)] are 
the natural infinite-dimensional generalization of the univariate generalized 
extreme value (GEV) distribution. Just as the only limiting distribution 
of the scaled maximum of independent univariate random variables is the 
GEV, the scaled maximum of independent copies of any stochastic process 
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can only converge to a max-stable process. Max-stable process models for 
spatial data may be constructed using the spectral representation of de Haan 
(1984). Max-stable processes built from this representation were first used for 
spatial analysis by Smith (1990). Since then, a handful of subsequent spatial 
max-stable process models have been proposed, notably that of Schlather 
(2002) and Kabluchko, Schlather and de Haan (2009), who proposed a more 
general construction that includes several other known models as special 
cases. Applications of spatial max-stable processes include Coles (1993), 
Buishand, de Haan and Zhou (2008), and Blanchet and Davison (2011). 

Because closed-form expressions for the likelihoods associated with spatial 
max-stable processes are not available, parameter estimation and inference 
is problematic. Taking advantage of the availability of bivariate densities, 
Padoan, Ribatet and Sisson (2010) suggest maximum pairwise likelihood es- 
timation and asymptotic inference based on a sandwich matrix (composed 
of expected derivatives of the composite likelihood function) to properly 
account for using a pairwise likelihood when computing standard errors 
[Godambe and Heyde (1987)]. Recently, Genton, Ma and Sang (2011) ex- 
tend this approach using composite likelihood based on trivariate densities. 
The problem of spatial prediction, conditional on observations, for max- 
stable random fields (analogous to Kriging for Gaussian processes) has also 
proven difficult. The recent conditional sampling algorithm of Wang and 
Stoev (2010a) is capable of producing both predictions and prediction stan- 
dard errors for most spatial max-stable models of practical interest, subject 
to discretization errors that can be made arbitrarily small. 

Bayesian estimation and inference for max-stable process models for spa- 
tial data on a continuous domain has been elusive. Implementing these mod- 
els in a fully-Bayesian framework has several advantages, including incorpo- 
ration of prior information and natural uncertainty assessment for model pa- 
rameters and predictions. Approximate Bayesian methods based on asymp- 
totic properties of the pairwise likelihood function are possible. Ribatet, 
Cooley and Davison (2012) use an estimated sandwich matrix to adjust 
the Metropolis ratio within an MCMC sampler, while Shaby (2012) rotates 
and scales the MCMC sample post-hoc and Smith and Stephenson (2009) 
use pairwise likelihoods without adjustment. Bayesian models that are not 
based on max stable processes have been used for analysis of extreme values 
with spatial structure. Cooley, Nychka and Naveau (2007) uses a hierarchi- 
cal model with a conditionally-independent generalized Pareto likelihood, 
incorporating all spatial dependence through Gaussian process priors on 
the generalized Pareto likelihood parameters. Spatial dependence has also 
been achieved through Bayesian Gaussian copula models [Sang and Gelfand 
(2010)] and through a more flexible copula based on a Dirichlet process 
construction [Fuentes, Henry and Reich (2012)]. 

We develop a new hierarchical Bayesian model for analyzing max-stable 
processes. The responses are modeled as independent univariate GEV con- 



A MAX-STABLE MODEL FOR EXTREME PRECIPITATION 



3 



ditioned on spatial random effects with positive stable random effect distri- 
bution. Positive stable random effects have been used to model multivari- 
ate extremes with finite dimensions [Fougeres, Nolan and Rootzen (2009), 
Stephenson (2009)]. We extend this approach to accommodate data on a 
continuous spatial domain. We show that the resulting model is max-stable 
marginally over the random effects, and that a limiting case of this construc- 
tion provides a finite-dimensional approximation to the well-known Gaus- 
sian extreme value process (GEVP) of Smith (1990), often referred to as the 
"Smith process." Lower-dimensional representations have previously been 
proposed for high-dimensional extremes in various settings [Pickands (1981), 
Deheuvels (1983), Schlather and Tawn (2002), Ehlert and Schlather (2008), 
Wang and Stoev (2010a,b), Oesting, Kabluchko and Schlather (2012), En- 
gelke, Kabluchko and Schlather (2011)]. Our construction permits analysis 
of the joint distribution of all observations, and thus can produce straight- 
forward predictions at unobserved locations. Because we use a hierarchical 
model to represent the spatial max-stable process, a Bayesian implementa- 
tion is a natural choice. This allows us to model underlying marginal struc- 
tures as flexibly as we like, in addition to automatic pooling of information 
and uncertainty propagation. Also, the proposed framework permits rep- 
resenting the the spatial process using a lower-dimensional representation, 
which leads to efficient computing for large spatial data sets. 

The remainder of the paper proceeds as follows. Section 2 describes the 
model, which is compared to the GEVP in Section 3. The method is eval- 
uated using a simulation study in Section 4. In Section 5 we use the pro- 
posed method to analyze yearly maximum precipitation using regional cli- 
mate model output from the North American Regional Climate Change 
Assessment Program (NARCCAP) in the eastern US. Section 6 concludes. 

2. The hierarchical max-stable process model. 

2.1. Spatial random effects model. Let ^(s) be the extreme value at lo- 
cation s, defined over the region s € P C 'R?. Here we describe a max-stable 
model for ^(s) assuming that it is a block-maximum, that is, the maxi- 
mum of many observations taken at location s, such as the yearly maximum 
of daily precipitation levels. However, we note that max-stable models are 
increasingly being used to model extreme individual observations using a 
points above threshold approach [Huser and Davison (2012)], and the resid- 
ual max-stable process model described here may be applicable to this type 
of analysis as well. We describe a model for a single realization of the process 
and extend to multiple independent realizations in Section 2.3. 

Assuming the process is max-stable, then the marginal distribution of 
y(s) is GEV[^(s),it(s),^(s)], where //(s) is the location, a{s) > is the 
scale, and ^(s) is the shape (GEV distribution is described in Appendix A.l). 
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Equivalently [Resnick (1987)], we may express Y{s) = fj,{s) + ||^[X(s)5(^) — 
1], where X{s) is the residual max-stable process with unit Frechet margins, 
that is, X{s) ~ GEV(1, 1, 1). To allow for both nonspatial and spatial resid- 
ual variability, we model ^(s) as the product ^(s) = U{s)6(s). Borrowing a 
term from geostatistics, we refer to U{s) as the nugget effect since it accounts 
for nonspatial variation due to measurement error or other small-scale fea- 
tures. The nugg et is modeled as U{s) ~ GEV(1 , a,a), where, as described 
in detail below, the parameter a G (0, 1) controls the relative contribution 
to the nugget effect. 

Residual spatial dependence is captured by ^(s). We express the spatial 
process as a function of a linear combination of L kernel basis functions 
wi{s) > 0, scaled so that '^iLiWi{s) = 1 for all s. The spatial process is 
9{s) = Aw'z(s)^/"]", where Ai are the basis function coefficients. To 

ensure max-stability and Prechet marginal distributions, the random effects 
Ai follow the positive stable distribution with density p{A\a) which has 
Laplace transformation exp{— At)p{A\a) dA = exp(— i"). We denote this 
as Ai ~ PS(a). Although p{A\a) has no closed form, it possesses the essential 

feature that if Ai,..., At ''^ ' PS (a), then (^i + • • • + AT)/rV" ^ PS (a). 
Appendix A. 2 verifies that this model for X{s) is max-stable with unit 
Frechet marginal distributions. 

Marginalizing over the nugget terms U (s) gives the hierarchical model 

y(s,)|Ai, . ..,Al GEV[^r (s,), a*(sO,r(s.)], 

(2.1) 

Ai PS(a), 

where fi*{s) = fi{s) + ||f|[e(s)«(^) - 1], a*{s) = aa{s)9{s)^(-^\ and ^{s) = 

a^(s). The responses are conditionally independent given the random ef- 
fects A. The effect of conditioning on A = (^i, . . . ,Al)'^, and thus the spa- 
tial process 0, is to move spatial dependence from the residuals to a random 
effect in the GEV parameters. Marginalizing over the random effects induces 
spatial dependence. The joint distribution function of the residual process 
X at n locations Si, . . . , s„ is 

(2.2) P(X(si) < Q, i = 1, . . . , n) = expj - ^ 

I 1=1 

Therefore, although this is a process model defined on a continuous spatial 
domain, the finite-dimensional distributions are multivariate GEV (MGEV) 
with asymmetric logistic dependence function [Tawn (1990)]. 

Spatial dependence is often summarized by the extremal coefficient [Smith 
(1990)]. The pairwise extremal coefficient ??(sj,Sj) € [1,2] is defined by the 
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Fig. 1. Random draws on a 50 x 50 grid from the random effect model for various a. 
Each sample has a knot at each data point, GEV parameters ^(s) = log[(j(s)] = and 
^(s) = —0.1, and bandwidth r = 2. 



relationship 

(2.3) P{X{si) < c,X(sj) < c) = P{X{si) < cf'^'^'''\ 

If X{si) and X{sj) are independent, then ^^(sjjSj) = 2; in contrast, if X{si) 
and X{sj) are completely dependent, then T?(sj,Sj) = 1. The extremal coef- 
ficient introduced by (2.1) is 

L 

(2.4) i?(si,s,) = + wiis.Yh'". 

1=1 

Therefore, the extremal coefficient is the sum (over the L kernels) of the 
^1/" norms of the vectors [wi{si),wi{sj)]. 

To see how a controls the nugget effect, consider two observations at the 
same location, Sj = Sj. The two observations share the same kernels, wi{si) = 
wi{sj) and thus 9{si) = 0{sj), but have different nugget terms C/(sj) 7^ U{sj). 
In this case, the extremal coefficient is 2". If a = 1, then the nugget dom- 
inates and i9(sj,Sj) = 2 for all pairs of locations, regardless of their spatial 
locations [since ^iLiWi{s) = 1 for all s]. If a = 0, then 19(84, Sj) = 1 when 
Sj = Sj, and there is no nugget effect. The characteristics of the model are 
shown graphically in Figure 1. In these random draws from the model, we 
see the process is very smooth for a = 0.1 and has little discernable spatial 
pattern with a = 0.9. 

The parameter a clearly plays an important role in this model. It deter- 
mines the magnitude of the nugget effect, the form of spatial dependence 
function in (2.4), and the shape and scale of the conditional distributions in 
(2.1). To illustrate the links between the contribution of a to these aspects of 
the model, we consider the extreme cases with q = and q = 1. With a = 1, 
p(A|a) concentrates its mass on A = 1, and thus ^(s) = Xl^i ^z(s) = 1. In 
this case, the conditional and marginal GEV parameters are the same, for 
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example, /x*(s) = ^(s), there is no residual dependence with 9{si, sj) = 2, and 

thus y(s) GEV[/u(s),(7(s),^(s)]. On the other hand, if q 0, then the 
conditional scale cr*{s) ~ and l^(s) ~ /^*(s), a continuous spatial process 
with strong small-scale spatial dependence ^(s,s) ~ 1. 

Spatial prediction (analogous to Kriging) at a new location s* is straight- 
forward under this hierarchical model. Predictions are made by simply com- 
puting ^(s*) = [X^^i ^/W/(s*)^/"]", and then sampling Y{s*) from the in- 
dependent GEV in (2.1). Repeating this at every MCMC iteration gives 
samples from the posterior predictive distribution of ^(s*). 

2.2. Kernel and knot selection. Although other kernels are possible, we 
use a scaled version of the Gaussian kernel 



(2.5) K{s\\-i,t) = ^^exp 



-^{s-vif{s-vi) 



where vi, . . . , vj;, € TZ^ are spatial knots and r > is the kernel bandwidth. 
To ensure that the kernels sum to one at each location, the kernels are scaled 
as 



K(s\vi,t) 



(2.6) M^) = ^L I 

The knots are taken as a fixed and regularly-spaced grid of points covering 
the spatial domain. Section 3 shows that this choice of kernel function and 
knot distribution gives the GEVP as a limiting case. Even with a regular 
grid of knots, the extremal coefficient is nonstationary, that is, 'i9(sj,Sj) is 
not simply a function of ||sj — Sj||. For example, w{si) may not equal w{sj) 
if Sj is close to a knot and Sj is not. This discretization artifact dissipates 
for large L. 

While the extremal coefficient does not fully characterize spatial depen- 
dence, it is useful for guiding knot selection. Knot selection poses a trade-off 
between computational burden with too many knots and poor fit with too 
few knots. Consider the case of a Gaussian kernel with bandwidth r = 1 and 
knots on a large rectangular grid with grid spacing d. Figure 2 plots the 
extremal coefficient for points (0,0) and (0,/i) as a function of separation 
distance h. The extremal coefficient has nearly an identical shape for all d 
less than or equal to r. For d = 1.25r, the extremal coefficient differs slightly 
from the fine grids, and for d > 1.25r the extremal coefficient deviates con- 
siderably from the fine grids, especially for small a. These results will scale 
for other r, therefore, a rule of thumb is to select the knots so that the grid 
spacing is approximately equal to the kernel bandwidth. Knot selection is 
discussed further in Section 4. 
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Fig. 2. Extremal coefficient ■&{s\,S2), where si = (0,0), S2 = {0,h), the kernel bandwidth 
is T = 1, and the knots form a rectangular grid with grid spacing d. 

2.3. Adaption for the NARCCAP data. In Section 5 we analyze climate 
model output for T > 1 years, which requires additional notation. Denote 
Yt(s) as the response for year t and site s. Assuming the years are indepen- 
dent and identically distributed (over years, not space) gives 

Yt{s,)\A^t, ....Au GEV[/z,*(s,),cTr(s,),r (s^)], 

(2.7) 

Au PS(a), 

where 0t(s) = [X^^i ^ittf«(s)^^°]° is the spatial random effect for year t, 
/x;(s) = /z(s) + |g}(0t(s)€(^) - 1), a,*(s) = aa(s)0i(s)5(^), and r(s) = ^^(s). 
Note that while the GEV parameters conditioned on 0t(s) in (2.7) vary by 
year, marginally, lt(s) ^ GEV[^(s), (t(s), ^(s)] for all t. 

Gaussian process priors are used for the GEV parameters /u(s), 7(3) = 
log[(T(s)], and ^(s). The Gaussian process [i has mean x(s)-^/3^, where x(s) 
includes the spatial covariates such as elevation. The spatial covariance of 
/i is Matern [Banerjee, Carlin and Gelfand (2004), Cressie (1993), Gelfand 
et al. (2010)] with variance 5^ > 0, range > 0, and smoothness z/^ > 0. The 
other GEV parameters 7(3) and ^(s) are modeled similarly. In some appli- 
cations, it may also be desirable to allow for the GEV parameters to evolve 
over time, perhaps following a separate linear time trend at each location, 
which would be a straightforward modification of this model. The MCMC 
algorithm used to sample from this model is described in Appendix A. 3. 

3. Connection with the Gaussian extreme value process. The GEVP of 
Smith (1990) is a well-known spatial max-stable process. In this section we 
show that the proposed positive stable random effects model in Section 2 
contains this model as a limiting case. The GEVP construction for the resid- 
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ual process is 

(3.1) X{s) = max{/iiK(s|ui, S), h2K{s\u2, S), . . .}, 

where {(hi,ui), (/i2,U2), . . .} follows aPoisson process with intensity A(/i,u) = 
h~'^I{h > 0), and is a kernel function standardized so that / if(s|u, r) du = 
1 for all s. The construction (3.1) is a special case of the de Haan [de Haan 
(1984)] spectral representation. A useful analogy is to think of X{s) as the 
maximum rainfall at site s, generated as the maximum over a countably- 
infinite number of storms. The fcth storm has center € TZ'^, intensity 
hk > 0, and spatial range given by K{s\\ik,T). 

Under this model, the joint distribution at locations si, . . . ,s„ is 



/ max< > au 

J i [ Ci J 



(3.2) P[X(si)<ci,...,X( 

^n) ^ — exp 
The GEVP has extremal coefficient 

(3.3) ??(sj,Sj) = J max{K{si\u,T,), K{sj\u,T;)} du, 

which simplifies to ^{si,Sj) = 2<1>( ^^^'^r^^^^ ) the Gaussian kernel (2.5). This 
does not include a nugget effect, since i?(sj,Sj) = 1 if ||sj — Sj|| = 0. 

The connection to the model in Section 2 is made by restricting the storm 
locations to the set of L knot locations {vi, . . . ,vl} and rescaling the kernels 
to sum to one as in (2.6), giving 

(3.4) X{s) =mayi{hiWi{s),...,hLWL{s)}. 

This amounts to truncating the de Haan spectral representation. If hi ~ 
GEV(1, 1, 1), then X{s) is max-stable with joint distribution 



(3.5) P[X(si)<ci,...,X( 

^n) ^ Cfi] — exp 



> max< 

I Ci 



1=1 



which implies that the marginal distributions are unit Frechet. For equally- 
spaced knots, this distribution converges weakly to the full GEVP distribu- 
tion function (3.2) as L increases. We note that this finite approximation 
could be applied to other max-stable models such as those in Schlather 
(2002) and Kabluchko, Schlather and de Haan (2009) by allowing the func- 
tions K to be suitably scaled Gaussian processes, unlike the current approach 
where if is a kernel function. 

Using the model described by (3.5) directly is problematic because it may 
not yield a proper likelihood. The process (3.4) at n locations {X(si),..., 
X{sn)} is completely determined by the intensities {hi, . . . , h^}. Therefore, 
the likelihood for {X{si ), . . . , X(s„)} requires a map from {X(si), . . . , X(s„)} 
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to {hi, . . . jhi}. This map may not exist, for example, if L < n, and gener- 
ally does not have a closed form. This is common in dimension reduction 
methods for Gaussian process models [e.g., Higdon, Swall and Kern (1999), 
Banerjee et al. (2008), and Cressie and Johannesson (2008)]. 

As with the Gaussian process dimension reduction methods, the model in 
Section 2 includes both a spatial process (0) and a nonspatial nugget term 
(U). Comparing (2.2) and (3.5), we see a result of the nugget effect is that the 
L°° norm (the maximum) in (3.5) is replaced with the L^/" norm, and that 
(2.2) converges weakly to (3.5) as a goes to zero. Including a nugget aids 
in computation, as the likelihood becomes a simple product of univariate 
GEV densities. Including a nugget term also has advantages beyond compu- 
tation. The GEVP has been criticized as unrealistically smooth [Blanchet 
and Davison (2011)], and so a nugget may improve fit. Analogously, in the 
geostatistical literature for Gaussian data a nugget is not required, but is 
used routinely to account for small-scale phenomena that cannot be cap- 
tured with a smooth spatial process [Cressie (1993), Banerjee, Carlin and 
Gelfand (2004), Gelfand et al. (2010)]. 

4. Simulation study. In this section we conduct a simulation study to 
verify that the MCMC algorithm produces reliable results, to investigate 
sensitivity to knot selection, and to determine which parameters are the 
most difficult to estimate. Data and knots are placed on m x m regular 
grids covering [l,u] x [l,u], denoted S{m, l,u). For each simulation design, we 
generate data from the model described in Section 2.3 at the n = 49 locations 
5(7, 0, 6) and T = 10 independent years. The GEV location parameter varies 
by site following the Gaussian process with mean zero, variance one, and 
exponential spatial correlation exp(— ||sj — Sj||/2). Unlike the analysis of the 
NARCCAP data in Section 5, the GEV scale and shape parameters are 
assumed to be the same for all sites and fixed at cj(s) = 1 and ^(s) = 0.2. We 
fix these parameters in the simulation study for computational purposes, and 
because these spatially-varying parameters will likely be hard to estimate 
for these moderately-sized simulated data sets. The simulations vary by the 
nugget effect (a), the kernel bandwidth (r), and the number of knots used 
to generate the data (Lq). The simulation designs are numbered: 

(1) Lo = 49 knots at 5(7, 0, 6) , a = 0.3, r = 3, 

(2) Lo = 49 knots at 5(7,0,6), a = 0.7, r = 3, 

(3) Lo = 25 knots at 5(5, 0, 6) , a = 0.3, r = 3, 

(4) Lo = 25 knots at 5(5, 0, 6) , a = 0.7, r = 3, 

(5) Lo = 10,000 knots at 5(100, -1, 7), a = 0.4, r = 1. 

For the first four designs, the number of knots used to generate the data is 
small enough to permit fitting the model with the correct number of knots. 
We use these examples to explore sensitivity to knot selection. The final 
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design with Lq = 10,000 knots represents the limiting case with more knots 
than can be fit computationally. Here we fit several course grids of knots and 
compare performance as the number of knots increases to provide recom- 
mendations on the number of knots needed to provide a good approximation 
to the limiting process. 

M = 50 data sets are generated for each simulation design. For each sim- 
ulated data set, we fit the model with a varying number of knots. For 
the first four designs we compare L = 25 knots at 5(5,0,6) and L = 49 
knots at 5(7,0,6) to compare fits with the true knots and either too few 
(L = 25 for designs 1 and 2) or too many {L = 49 for designs 3 and 4) 
knots. For the final design we compare fits with 8 knot grids: L = 25 knots 
at 5(5, — 1, 7), . . . , L = 144 knots at 5(12, —1, 7). The spatial covariance pa- 
rameters for the GEV location have priors (5^ ~ InvGamma(0.1, 0.1) and 
range ~ InvGamma(0.1, 0.1); for this relatively small spatial domain we 
fix the smoothness parameter z/^ = 0.5, giving an exponential covariance. 
The design matrix X includes only the intercept with f3^ ~ N(0, 100^). The 
GEV log scale and shape are constant across space and have N(0, 1) and 
N(0,0.252) priors, respectively. The residual dependence parameters have 
priors r InvGamma(0.1, 0.1) and Q~Unif(0,l). 

The results are presented in Figure 3. For each data set, we compute 
the posterior mean of the GEV parameters at each location and the mean 
squared error (MSE) of the posterior means (averaged over the n sites for 
the spatially- varying GEV location) . Figure 3 plots the M = 50 root MSEs 
and coverage probabilities (averaged over sites for the GEV location). 

For the data generated with Lq = 25 or Lq = 49 knots in Figure 3(a), 
the coverage probabilities are generally near the nominal level. With L = 49 
knots, the coverage probabilities range from 0.90 to 0.94 for the GEV loca- 
tion. For the first two designs, the model with L = 25 knots has fewer knots 
than were used to generate the data. This does not have a substantial im- 
pact on the estimation of the GEV location. However, using too few knots 
leads to increased RMSE and under-coverage for the GEV log scale, espe- 
cially for design 1 with strong spatial dependence. For simulation designs 3 
and 4, the model with L = 49 knots has nearly twice as many knots than 
were used to generate the data. In these cases, the L = 49 model performs 
nearly as well as the correct L = 25 model. For these simulation settings, 
we conclude that using too few knots can lead to poor results, especially for 
the scale parameter, but that including too many knots does not degrade 
performance. 

For the data generated with Lq = 10,000 knots in Figure 3(b), we use 
knots grids with L = 25, 36, . . . , 144 points. For comparison with the kernel 
bandwidth, rather than plotting the results by L, we plot results by the 
spacing between adjacent knots in the same column or row, which ranges 
from 0.70 for L = 144 to 2.00 for L = 25. The coverage probabilities are near 
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(b) RMSE by grid spacing for simulation design 5 
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Fig. 3. Boxplots of root mean squared error (RMSE) for the GEV parameters in the 
simulation study. The horizontal lines in the boxplots are the 0.05, 0.25, 0.50, 0.75, and 
0.95 quantiles of RMSE. Coverage percentages of posterior 95% intervals are given below 
the boxplots. 



or above the nominal level for all grid spacings at or below the bandwidth, 
r = 1.0, and the RMSE appears to be fairly constant for all grid spacings at 
least as small as the bandwidth. Therefore, this appears to be a reasonable 
rule of thumb for selecting the number of knots. 

We also computed RMSE for the spatial dependence parameters a and 
T (not shown in Figure 3) for this final case. For a, the average (over data 
sets) RMSE was 0.049 (coverage percentage 96%), 0.060 (88%), and 0.101 
(40%) for grid spacings 0.7, 1.0, and 2.0, respectively. For r, the average 
RMSE was 0.102 (88% ), 0.107 (90%), and 0.233 (38%) for grid spacings 0.7, 
1.0, and 2.0, respectively. As with the GEV parameters, the approximation 
with the grid spacing at least as small as the bandwidth appears to provide 
reasonable estimation of the spatial dependence parameters. When too few 
knots are used, the bandwidth is often overestimated to compensate for the 
lack of knots and, thus, RMSE is high and coverage is far below the nominal 
level. 
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5. Analysis of regional climate model output. To illustrate the proposed 
method, we analyze climate model output provided by the North American 
Regional Climate Change Assessment Program (NARCCAP). Our objective 
is to study changes in extreme precipitation under various climate scenarios 
in different spatial regions while accounting for residual spatial dependence 
remaining after allowing for spatially- varying GEV parameters. The data are 
downloaded from the website http : / / www . narccap . ucar . edu/ index . html. 
We analyze output from two timeslice experiments. Both runs use the Geo- 
physical Fluid Dynamics Laboratory's AM2.1 climate model with 50 km 
resolution. The model is run separately under historical (1969-2000) and 
future conditions (2039-2070). Observational data is used for the sea-surface 
temperature and ice boundary conditions in the historical run. The bound- 
ary conditions for the future run are perturbations of the historical bound- 
ary conditions. The amount of perturbation is based on a lower resolution 
climate model. The perturbations assume the A2 emissions scenario [Na- 
kicenovic et al. (2000)], which increases CO2 concentration levels from the 
current values of about 380 ppm to about 870 ppm by the end of the 21st 
century. 

We analyze data for n = 697 grid cells in eastern US as shown in Figure 4. 
For grid cell i with location Sj and year t, we take the annual maximum of 
the daily precipitation totals as the response, i^(sj). NARCCAP provides 
eight 3-hour precipitation rates each day, and we compute the daily to- 
tal by summing these eight values and multiplying by three. To explore 
the form of residual spatial dependence, we use the madogram [Cooley, 
Naveau and Poncet (2006)] function in the SpatialExtremes package in R 
(www.r-project.org). The madogram converts the observations at each site 




Fig. 4. Grid cell centers for the NARCCAP output (left) and madogram extremal coeffi- 
cient estimate (box width is proportional to the number of observations) for the historical 
run. 
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to have unit Frechet margins using a rank transformation, and then esti- 
mates the pairwise extremal coefficients. Figure 4 plots the estimated ex- 
tremal coefficients against ||sj — Sj||. This plot clearly shows residual spatial 
dependence. 

The data from the two runs are analyzed separately using the model 
described in Section 2. We assume that the process is stationary in time 
during each period, that is, the GEV marginal density at each location 
is constant over time in each simulation period. We use n = L terms with 
knots at the data points si, . . . , s„. The residual dependence parameters have 
priors r ~ InvGamma(0.1, 0.1) and a~Unif(0,l). For both scenarios, all 
three GEV parameters vary spatially following Gaussian process priors. The 
covariates for the mean of the GEV parameters, x(s), include the intercept, 
grid cell latitude, longitude, elevation, and log elevation. The elements of 
f3^ have independent N(0, 100^) priors. The spatial covariance parameters 
have priors 6j InvGamma(0.1, 0.1), range pj ~ InvGamma(0.1, 0.1), and 
smoothness i^j ~ InvGamma(0.1, 0.1) for j € 7,^}- 

Figure 5 shows the estimated GEV parameters for the historical simu- 
lation. The estimated location and log scale parameters are highest in the 
southeast. The posterior mean of the GEV shape is generally positive, in- 
dicating a right-skewed distribution with no upper bound. The estimated 
shape is the largest in Florida. Comparing the posterior means and standard 
deviations, there is evidence that all three GEV parameters vary spatially. 
Figure 6 shows that there is strong positive dependence between the shape 
and scale as one might expect, since for shape in (0, 0.5) both the mean and 
variance of GEV includes the ratio of the scale and shape. For locations with 
large shapes, there is a negative dependence with the log scale. 

To formally assess the need for spatially- varying GEV parameters, we 
also refit the model for the historical simulation using the Bayesian vari- 
able selection prior of Reich et al. (2010) to test whether the variance 
6j equals small constant = 0.01^. The test is carried out using the 
mixture prior 6j = gjd* -|- (1 — gj)AQ, where gj Bernoulli(0.5) and 6*"^ ~ 
InvGamma(0.1, 0.1). The intuition behind this prior is that if gj = 1, then 
5j ~ InvGamma(0.1, 0.1) and the GEV parameter varies spatially; in con- 
trast, if go = 0, then 6j = Aj, and spatial variation after accounting for 
spatial covariates x is negligible. Therefore, the posterior mean of gj can be 
interpreted as the posterior probability that the jth GEV parameter varies 
spatially, which can be used to approximate the Bayes factor comparing 
these models. In the separate mixture prior fit, the posterior probability 
that the GEV parameters vary spatially was at least 0.99 for all three pa- 
rameters. 

We also aim to quantify changes in extreme quantiles. The gth quantile 
at location s is fi{s) + <t(s)[1 — log(l/g)~'>(®)]/,^(s), which is also called the 
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(a) GEV location, 
posterior mean 




(b) GEV log scale, 
posterior mean 




(c) GEV shape, 
posterior mean 




(d) GEV location, (c) GEV log scale, (f) GEV shape, 

posterior sd posterior sd posterior sd 



Fig. 5. Posterior mean and standard deviation of the GEV location, log scale, and shape 
parameters for the historical simulation. All units are mm/h. 




GE\/ log scali 



Fig. 6. Plot of the posterior mean GEV scale versus posterior mean GEV shape at each 
site. 
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(a) 0.10 quantile, (b) 0.50 quantile, (c) 0.95 quantile, 

posterior mean posterior mean posterior mean 




(d) 0.10 quantile, 
posterior sd 



(e) 0,50 quantile, 
posterior sd 



(f ) 0.95 quantile, 
posterior sd 



Fig. 7. Posterior mean and standard deviation of the 0.10, 0.50, and 0.95 quantiles for 
the historical simulation. All units are mm/h. 



1/(1 — q) year return level. Figure 7 plots the posterior of various pointwise 
quantile levels. The large location and scale parameters lead to large medians 
in the southeast, while the 0.95 quantile is the largest in Florida due to the 
large shape parameter. 

The difference between the historical and future scenarios is summarized 
in Figures 8 and 9. The estimated GEV location and log scale parameters 
are larger for the future scenario for the majority of the spatial domain. The 
increase is the largest in Alabama, Georgia, and New England. The shape 
parameter also shows an increase in Alabama, but statistically significant 
decrease in Florida. Figure 9(c) shows that these changes in GEV parameters 
lead to an increase in the 0.95 quantile for most of the spatial domain. With 
the exception of the midwest and southern Florida, the posterior probability 
of an increase in the 0.95 quantile is near one [Figure 9(d)], indicating that 
extremes have a different spatial pattern in the future scenario. 
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(a) GEV location, (b) GEV log scale, (c) GEV shape, 

estimated increase estimated increase estimated increase 




(d) GEV location, 
pro)] of an increase 



(e) GEV log scale, 
prob of an increase 



(f) GEV shape, 

prob of an increase 



Fig. 8. Posterior mean change from historical to future time simulation and the pos- 
terior probability that this change is positive for the GEV location, log scale, and shape 
parameters. All units are mm/h. 



Parameter estimates provide evidence of residual dependence: the pos- 
terior mean (standard deviation) of a is 0.483 (0.008) and the posterior 
mean of the spatial range r is 41.6 (0.4) kilometers. To illustrate the effects 
of failing to account for residual spatial dependence, we compare these re- 
sults with the model that ignores spatial dependence in the residuals, that 
is, sets a = 1. One effect of accounting for residual dependence is an in- 
crease in posterior variance for the GEV parameters. Figure 10 shows that 
the posterior variance often doubles as a result of including residual depen- 
dence. Therefore, while spatial modeling of the GEV parameters reduces 
uncertainty by borrowing strength across space compared to analyzing all 
sites completely separately, it appears that spatial modeling of the GEV 
parameters without accounting for residual dependence underestimates un- 
certainty. 
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(a) 0.10 quaiitile, (b) 0.50 quantile, (c) 0.95 quantile, 

estimated increase estimated increase estimated increase 




(d) 0.10 quantile, (e) 0.50 quantile, (f) 0.95 quantile, 

prob of an increase prob of an increase prob of an increase 



Fig. 9. Posterior mean change from historical to future time simulations and the poste- 
rior probability that this change is positive for the 0.10, 0.50, and 0.95 quantiles. All units 
are mm/h. 




(a) GEV location (b) GEV log scale (c) GEV shape 



Fig. 10. Ratio of the posterior variance of the GEV parameters for the models with and 
without residual spatial dependence. 
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6. Discussion. In this paper we propose a new modeling approach for 
spatial max-stable processes. The proposed model is closely related to the 
GEVP and permits a Bayesian analysis via MCMC methods. Applied to 
the climate data, we find statistically significant increases under the future 
climate scenario in the upper quantiles of precipitation for most of the spatial 
domain, with the largest increase in the southeast. 

The proposed hierarchical model opens the door for several exciting re- 
search directions. The model could be made even more flexible by chang- 
ing the form of the kernels. It should be possible to replace the Gaussian 
kernel with any other kernel that integrates to one, that is, any other two- 
dimensional density function. For large data sets, it may even be possible to 
estimate the kernel function nonparametrically from the data. Zheng, Zhu 
and Roy (2010) and Reich and Fuentes (2012) use Bayesian nonparamet- 
rics to estimate the spatial covariance function of a Gaussian process. This 
approach could be extended to the extreme data, using, say, a Dirichlet pro- 
cess mixture prior for the kernel function. The methods proposed in this 
paper could also be extended to more complicated dependency structures. 
For example, we have ignored the temporal dependence because the spa- 
tial association is far stronger than the temporal association for these data. 
However, using three-dimensional kernels (two for space, one for time) would 
give a feasible max-stable model for spatiotemporal data. 



A.l. Generalized extreme value (GEV) distribution. The GEV distri- 
bution has three parameters: location /x, scale cr > 0, and shape ^. If y ~ 
GEV(/i, cj, ^), then Y has distribution function P(Y < y) = exp[—t{y)] and 
density f{y) = ^t{y)^+^ ex.p[-t{y)], where 



The shape parameter determines the support, with Y £ (—oo, fi — a/S,] if 
^ < 0, Ye (-00,00) is C = 0, and Y £ [fi - a/C,oo) in ^ > 0. The GEV 
has three well-known subfamilies defined by the shape: the WeibuU (^ < 0), 
Gumbel (^ = 0), and Frechet (^ > 0) families. 

A.2. Properties of the random efTects model. Here we show that the 
hierarchical representation in (2.1) is max-stable and has GEV margins. 

GEV marginal distributions: Since the margins are identical for all loca- 
tions, we omit the notational dependence on s. The marginal distribution 
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function of X is 



P{X < c) 



P{X\A)p{A\a)dA 



exp 



a , 



p(A|a) dA 



(A.l) 



L 



L 



= JJexp{-(c ^/"ti;;"^^")"} = exp| --^-u;; j =exp(-l/c). 

This is the unit Prechet distribution function. 

Max- stability: The process is max-stable if for any set of locations {si, . . . , 
s„} and any t >0, Prob[X(si) <tci, . . . ,X(s„) <tCnf = Prob[X(si) <ci, . . . , 
x{sn)<Cn] [e.g., Zhang and Smith (2010)]. From (2.2), 



Prob[X(si) <tci,...,X(s„) <ten 



expi 



1=1 



E 

.i=l 



Wl{Si) 



tCi 



1/a 



a\ t 



I 1=1 li=l ^ 



'wiisi) 



1/a 



a ^ t 



= Prob[X(si) < Ci, . . . , X{Sn) < Cn]. 

A. 3. MCMC details. A comphcation that arises when using positive sta- 
ble random effects is that their density does not have a closed form. To 
overcome this problem, we use the auxiliary variable technique of Stephen- 
son (2009) for the asymmetric logistic MGEV. Stephenson (2009) introduces 
auxiliary variables Bi € (0, 1) so that 



(A.2) 



p{A,B\a) = c(S)exp[-c(S)^-"/(i"°)] 

1 — a 



where c(B) = r '^'"(°""^) ii/(i ") '^'"[(^ a)nB] ^ r^j^ marginally over Bi, Ai ~ 
PS(a). This marginalization is handled naturally via MCMC. Incorporating 
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the auxiliary variable gives 



Yt{s,)\Ait,Bu...,ALt,BLt 



indcp 



GEV[^,*(si),a;(s,),r(si)] 



(A.3) 



(Alt, Bit) 



i.i.d. 



p{A,B\a) 



which is the model fit to the data. 

We perform MCMC sampling for the model in (A.3) using R (http:// 
www.r-project.org/). The Metropolis within Gibbs algorithm is used to 
draw posterior samples. This begins with an initial value for each model 
parameter, and then parameters are updated one-at-a-time, conditionally 
on all other parameters. The GEV parameters //, o" = exp(7), and ^, spatial 
dependence parameters r and a, and auxiliary variables {Ai,Bi) are updated 
using Metropolis updates. To update the GEV location at site Sj for the rth 
MCMC iteration, we generate a candidate using a random walk Gaussian 
candidate distribution ^('^^(sj) N(;u('"~^)(sj), s^), where fj,^^~^\si) is the 
value at MCMC iteration r — 1 and s is a tuning parameter. The acceptance 
ratio is 



which is a function of the GEV likelihood of Yt{s) in (2.7), denoted as 
/[lt(s)|/i(s),exp[7(s)],^(s),0t(s)], as well as the full conditional prior of /i(sj) 
given ;u(sj) for all j ^i, p[/i(si)|/i(sj), j ^ i], which is found using the usual 
formula for the conditional distribution of a multivariate normal. The can- 
didate is accepted with probability min{i?, 1}. If the candidate is accepted, 
then fi^^\si) = fi^'^\si), otherwise the previous value is retained, fi^^\si) = 
^('■-i)(sj). The other GEV parameters 7(sj) and ^(sj) are updated similarly. 
GEV hyperparameters, such as (3^ and spatial covariance parameters, are 
updated conditionally on the GEV parameters and, thus, their updates are 
identical to the usual Bayesian geostatistical model. 

The spatial dependence parameters r and a and the auxiliary variables 
Alt and Bit are also updated using Metropolis sampling. These updates 
differ from ^(sj) only in their acceptance ratios. For computing purposes, 
we transform to 6 = log(T). The acceptance ratio for 5 is 



[ nLnr=W[yt(s,)|/x(s,),exp[7(s,)],e(s,),gf^(s,)] ] f p[6(^)] \ 

lnr=inr=in>^*(si)iM(sO,exp[7(s,)],e(s,),er"'^(s.)]J 



R = 



{ 




where 9t and 9^ are the values of Ot evaluated with t^'^^ = exp(5^'^^) 
and T^^~^^ = ex.p{6^^~^^), respectively, and p{6) is the log-gamma prior. The 
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acceptance ratio for a is 

nr=inr=in^i(s.)i/^(s.),exp[7(s,)],c(si),er"'^(s.)] 



I{0 < a(^) < 1 



We use a log-normal candidate distribution for ~ LN[log(A[[ ^^),s\], 

with density denoted ^^). The latent variables Au and Bu have 

acceptance ratios 

nr=in>^i(si)|/^(si),exp[7(s,)],C(s,),0f)(s,)] 



nr=i^[^t(s.)l/^(s.),exp[7(s,)],e(s^),^r"'^(s.)] 



p{At'\Bu\a)nqiAi^\At'^] 



for and 



for Sit. 

The standard deviations of all candidate distributions are adaptively 
tuned during the burn-in period to give acceptance rates near 0.4. Note 
that after the burn- in, the candidate distribution is fixed and this defines 
a stationary Markov chain and satisfies the usual mixing conditions, gen- 
erating samples from the true posterior distribution once convergence is 
reached. We generate two (one for the simulation study) chains of length 
25,000 samples and discard the first 10,000 samples of each chain as burn- 
in. Convergence is monitored using trace plots and autocorrelation plots for 
several representative parameters. 
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